-
-
-
- PERLRE(1)                                                    PERLRE(1)
-
-
-
- NAME
- perlre - Perl regular expressions
-
- DESCRIPTION
- This page describes the syntax of regular expressions in Perl. For a
- description of how to use regular expressions in matching operations,
- plus various examples of the same, see m// and s/// in the perlop
- manpage.
-
- The matching operations can have various modifiers. The modifiers that
- relate to the interpretation of the regular expression inside are listed
- below. For the modifiers that alter the behaviour of the operation, see
- the sections on m// and s/// in the perlop manpage.
-
- i Do case-insensitive pattern matching.
-
- If use locale is in effect, the case map is taken from the current
- locale. See the perllocale manpage.
-
- m Treat string as multiple lines. That is, change "^" and "$" from
- matching at only the very start or end of the string to the start or
- end of any line anywhere within the string.
-
- s Treat string as single line. That is, change "." to match any
- character whatsoever, even a newline, which it normally would not
- match.
-
- The /s and /m modifiers both override the $* setting. That is, no
- matter what $* contains, /s without /m will force "^" to match only
- at the beginning of the string and "$" to match only at the end (or
- just before a newline at the end) of the string. Together, as /ms,
- they let the "." match any character whatsoever, while yet allowing
- "^" and "$" to match, respectively, just after and just before
- newlines within the string.
-
- x Extend your pattern's legibility by permitting whitespace and
- comments.
-
- These are usually written as "the /x modifier", even though the delimiter
- in question might not actually be a slash. In fact, any of these
- modifiers may also be embedded within the regular expression itself using
- the new (?...) construct. See below.
-
- The /x modifier itself needs a little more explanation. It tells the
- regular expression parser to ignore whitespace that is neither
- backslashed nor within a character class. You can use this to break up
- your regular expression into (slightly) more readable parts. The #
- character is also treated as a metacharacter introducing a comment, just
- as in ordinary Perl code. This also means that if you want real
- whitespace or # characters in the pattern (outside of a character class,
- where they are unaffected by /x), you'll either have to escape them
- or encode them using octal or hex escapes. Taken together, these
- features go a long way towards making Perl's regular expressions more
- readable. See the C-comment deletion code in the perlop manpage.
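-
- A minimal sketch of /x in use follows. The sample string and variable
- names are invented for illustration; the point is only that unescaped
- whitespace and # comments in the pattern are ignored, while "\ " and
- "\#" stay literal.
-
```perl
use strict;
use warnings;

# Hypothetical log line, invented for this illustration.
my $line = "Time: 12:34:56 # done";

# Under /x, unescaped whitespace and "#" comments in the pattern are ignored.
my ($h, $m, $s) = $line =~ m{
    Time: \s+                  # label, then at least one whitespace character
    (\d\d) : (\d\d) : (\d\d)   # hours, minutes, seconds
    \ \#                       # an escaped space, then an escaped hash sign
}x;

print "$h-$m-$s\n";            # prints 12-34-56
```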
-
- Regular Expressions
-
- The patterns used in pattern matching are regular expressions such as
- those supplied in the Version 8 regex routines. (In fact, the routines
- are derived (distantly) from Henry Spencer's freely redistributable
- reimplementation of the V8 routines.) See the section on Version 8
- Regular Expressions for details.
-
- In particular the following metacharacters have their standard egrep-ish
- meanings:
-
- \    Quote the next metacharacter
- ^    Match the beginning of the line
- .    Match any character (except newline)
- $    Match the end of the line (or before newline at the end)
- |    Alternation
- ()   Grouping
- []   Character class
-
- By default, the "^" character is guaranteed to match at only the
- beginning of the string, the "$" character at only the end (or before the
- newline at the end) and Perl does certain optimizations with the
- assumption that the string contains only one line. Embedded newlines
- will not be matched by "^" or "$". You may, however, wish to treat a
- string as a multi-line buffer, such that the "^" will match after any
- newline within the string, and "$" will match before any newline. At the
- cost of a little more overhead, you can do this by using the /m modifier
- on the pattern match operator. (Older programs did this by setting $*,
- but this practice is now deprecated.)
-
- To facilitate multi-line substitutions, the "." character never matches a
- newline unless you use the /s modifier, which in effect tells Perl to
- pretend the string is a single line--even if it isn't. The /s modifier
- also overrides the setting of $*, in case you have some (badly behaved)
- older code that sets it in another module.
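-
- The effect of /m and /s can be seen in a short sketch like the
- following, where the two-line sample string is an assumption made up for
- the example:
-
```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, "^" matches only at the start of the string.
my @plain = $text =~ /^(\w+)/g;          # ("first")

# With /m, "^" also matches just after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;         # ("first", "second")

# Without /s, "." stops at the newline; with /s it crosses it.
my ($no_s)   = $text =~ /(first.*)/;     # "first line"
my ($with_s) = $text =~ /(first.*)/s;    # the whole string

print scalar(@plain), " ", scalar(@multi), "\n";      # prints 1 2
print length($no_s), " ", length($with_s), "\n";      # prints 10 23
```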
-
- The following standard quantifiers are recognized:
-
- *      Match 0 or more times
- +      Match 1 or more times
- ?      Match 1 or 0 times
- {n}    Match exactly n times
- {n,}   Match at least n times
- {n,m}  Match at least n but not more than m times
-
- (If a curly bracket occurs in any other context, it is treated as a
- regular character.) The "*" quantifier is equivalent to {0,}, the "+"
- quantifier to {1,}, and the "?" quantifier to {0,1}. n and m are limited to
- integral values less than 65536.
-
- By default, a quantified subpattern is "greedy", that is, it will match
- as many times as possible (given a particular starting location) while
- still allowing the rest of the pattern to match. If you want it to match
- the minimum number of times possible, follow the quantifier with a "?".
- Note that the meanings don't change, just the "greediness":
-
- *?      Match 0 or more times
- +?      Match 1 or more times
- ??      Match 0 or 1 time
- {n}?    Match exactly n times
- {n,}?   Match at least n times
- {n,m}?  Match at least n but not more than m times
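-
- For instance, under the assumption of a seven-digit sample string, the
- same bounded quantifier gives different results with and without the
- trailing "?":
-
```perl
use strict;
use warnings;

my $digits = "1234567";

my ($greedy)  = $digits =~ /(\d{2,4})/;    # takes as many as allowed: "1234"
my ($minimal) = $digits =~ /(\d{2,4}?)/;   # takes as few as allowed:  "12"

print "$greedy $minimal\n";
```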
-
- Because patterns are processed as double quoted strings, the following
- also work:
-
- \t     tab                   (HT, TAB)
- \n     newline               (LF, NL)
- \r     return                (CR)
- \f     form feed             (FF)
- \a     alarm (bell)          (BEL)
- \e     escape (think troff)  (ESC)
- \033   octal char (think of a PDP-11)
- \x1B   hex char
- \c[    control char
- \l     lowercase next char (think vi)
- \u     uppercase next char (think vi)
- \L     lowercase till \E (think vi)
- \U     uppercase till \E (think vi)
- \E     end case modification (think vi)
- \Q     quote (disable) pattern metacharacters till \E
-
- If use locale is in effect, the case map used by \l, \L, \u and \U is
- taken from the current locale. See the perllocale manpage.
-
- You cannot include a literal $ or @ within a \Q sequence. An unescaped $
- or @ interpolates the corresponding variable, while escaping will cause
- the literal string \$ to be matched. You'll need to write something like
- m/\Quser\E\@\Qhost/.
-
- In addition, Perl defines the following:
-
- \w   Match a "word" character (alphanumeric plus "_")
- \W   Match a non-word character
- \s   Match a whitespace character
- \S   Match a non-whitespace character
- \d   Match a digit character
- \D   Match a non-digit character
-
- A \w matches a single alphanumeric character, not a whole word. To match
- a word you'd need to say \w+. If use locale is in effect, the list of
- alphabetic characters generated by \w is taken from the current locale.
- See the perllocale manpage. You may use \w, \W, \s, \S, \d, and \D within
- character classes (though not as either end of a range).
-
- Perl defines the following zero-width assertions:
-
- \b   Match a word boundary
- \B   Match a non-(word boundary)
- \A   Match at only beginning of string
- \Z   Match at only end of string (or before newline at the end)
- \G   Match only where previous m//g left off (works only with /g)
-
- A word boundary (\b) is defined as a spot between two characters that has
- a \w on one side of it and a \W on the other side of it (in either
- order), counting the imaginary characters off the beginning and end of
- the string as matching a \W. (Within character classes \b represents
- backspace rather than a word boundary.) The \A and \Z are just like "^"
- and "$", except that they won't match multiple times when the /m modifier
- is used, while "^" and "$" will match at every internal line boundary.
- To match the actual end of the string, not ignoring newline, you can use
- \Z(?!\n). The \G assertion can be used to chain global matches (using
- m//g), as described in the section on Regexp Quote-Like Operators in the
- perlop manpage.
-
- It is also useful when writing lex-like scanners, where you have several
- patterns that you want to match against consecutive substrings of your
- string; see the previous reference. The actual location where \G will
- match can also be influenced by using pos() as an lvalue. See the pos
- entry in the perlfunc manpage.
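-
- As a rough sketch of such a scanner (the input string and token names
- are hypothetical), each alternative is anchored with \G so it can match
- only where the previous one stopped. The sketch also uses the /c
- modifier, which is described in perlop rather than here, so that a
- failed alternative does not reset pos():
-
```perl
use strict;
use warnings;

# Hypothetical input and token names, invented for this illustration.
my $src = "x = 42 + y";
my @tokens;

pos($src) = 0;
while (pos($src) < length($src)) {
    if    ($src =~ /\G\s+/gc)    { next }                        # skip blanks
    elsif ($src =~ /\G(\d+)/gc)  { push @tokens, "NUM($1)" }     # integer
    elsif ($src =~ /\G(\w+)/gc)  { push @tokens, "IDENT($1)" }   # identifier
    elsif ($src =~ /\G([=+])/gc) { push @tokens, "OP($1)" }      # operator
    else  { die "unexpected character at offset " . pos($src) . "\n" }
}

print join(" ", @tokens), "\n";
# prints IDENT(x) OP(=) NUM(42) OP(+) IDENT(y)
```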
-
- When the bracketing construct ( ... ) is used, \<digit> matches the
- digit'th substring. Outside of the pattern, always use "$" instead of
- "\" in front of the digit. (While the \<digit> notation can on rare
- occasion work outside the current pattern, this should not be relied
- upon. See the WARNING below.) The scope of $<digit> (and $`, $&, and $')
- extends to the end of the enclosing BLOCK or eval string, or to the next
- successful pattern match, whichever comes first. If you want to use
- parentheses to delimit a subpattern (e.g., a set of alternatives) without
- saving it as a subpattern, follow the ( with a ?:.
-
- You may have as many parentheses as you wish. If you have more than 9
- substrings, the variables $10, $11, ... refer to the corresponding
- substring. Within the pattern, \10, \11, etc. refer back to substrings
- if there have been at least that many left parentheses before the
- backreference. Otherwise (for backward compatibility) \10 is the same as
- \010, a backspace, and \11 the same as \011, a tab. And so on. (\1
- through \9 are always backreferences.)
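-
- A small sketch of a backreference at work, using an invented sample
- sentence: \1 inside the pattern must repeat whatever the first pair of
- parentheses matched, and $1 reads the same text afterwards.
-
```perl
use strict;
use warnings;

my $text = "Paris in the the spring";

# \1 refers back to the first group from inside the pattern.
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

print "doubled word: $dup\n";    # prints doubled word: the
```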
-
- $+ returns whatever the last bracket match matched. $& returns the
- entire matched string. ($0 used to return the same thing, but not any
- more.) $` returns everything before the matched string. $' returns
- everything after the matched string. Examples:
-
- s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words
-
- if (/Time: (..):(..):(..)/) {
-     $hours = $1;
-     $minutes = $2;
-     $seconds = $3;
- }
-
- Once perl sees that you need one of $&, $` or $' anywhere in the program,
- it has to provide them on each and every pattern match. This can slow
- your program down. The same mechanism that handles these provides for
- the use of $1, $2, etc., so you pay the same price for each pattern that
- contains capturing parentheses. But if you never use $&, etc., in your
- script, then patterns without capturing parentheses won't be penalized.
- So avoid $&, $', and $` if you can, but if you can't (and some algorithms
- really appreciate them), once you've used them once, use them at will,
- because you've already paid the price. As of 5.005, $& is not so costly
- as the other two.
-
- Backslashed metacharacters in Perl are alphanumeric, such as \b, \w, \n.
- Unlike some other regular expression languages, there are no backslashed
- symbols that aren't alphanumeric. So anything that looks like \\, \(,
- \), \<, \>, \{, or \} is always interpreted as a literal character, not a
- metacharacter. This was once used in a common idiom to disable or quote
- the special meanings of regular expression metacharacters in a string
- that you want to use for a pattern. Simply quote all non-alphanumeric
- characters:
-
- $pattern =~ s/(\W)/\\$1/g;
-
- Now it is much more common to see either the quotemeta() function or the
- \Q escape sequence used to disable all metacharacters' special meanings
- like this:
-
- /$unquoted\Q$quoted\E$unquoted/
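-
- The following sketch, with a made-up $literal string, shows what the
- quoting buys you: the quoted pattern matches only the literal text,
- while the unquoted one treats "." and "*" as metacharacters.
-
```perl
use strict;
use warnings;

# Hypothetical text containing regex metacharacters.
my $literal = "a.b*c";

my $escaped = quotemeta($literal);                      # a\.b\*c

my $hit   = "xxa.b*cxx"  =~ /\Q$literal\E/ ? 1 : 0;     # 1: literal text found
my $miss  = "xxaXbbbcxx" =~ /\Q$literal\E/ ? 1 : 0;     # 0: no literal "a.b*c"
my $loose = "xxaXbbbcxx" =~ /$literal/     ? 1 : 0;     # 1: "." and "*" are special

print "$escaped $hit $miss $loose\n";
```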
-
- Perl defines a consistent extension syntax for regular expressions. The
- syntax is a pair of parentheses with a question mark as the first thing
- within the parentheses (this was a syntax error in older versions of
- Perl). The character after the question mark gives the function of the
- extension. Several extensions are already supported:
-
- (?#text) A comment. The text is ignored. If the /x switch is used to
- enable whitespace formatting, a simple # will suffice.
-
- (?:pattern)
- This is for clustering, not capturing; it groups subexpressions
- like "()", but doesn't make backreferences as "()" does. So
-
- @fields = split(/\b(?:a|b|c)\b/)
-
- is like
-
- @fields = split(/\b(a|b|c)\b/)
-
- but doesn't spit out extra fields.
-
- (?=pattern)
- A zero-width positive lookahead assertion. For example,
- /\w+(?=\t)/ matches a word followed by a tab, without including
- the tab in $&.
-
- (?!pattern)
- A zero-width negative lookahead assertion. For example
- /foo(?!bar)/ matches any occurrence of "foo" that isn't
- followed by "bar". Note however that lookahead and lookbehind
- are NOT the same thing. You cannot use this for lookbehind.
-
- If you are looking for a "bar" that isn't preceded by a "foo",
- /(?!foo)bar/ will not do what you want. That's because the
- (?!foo) is just saying that the next thing cannot be "foo"--and
- it's not, it's a "bar", so "foobar" will match. You would have
- to do something like /(?!foo)...bar/ for that. We say "like"
- because there's the case of your "bar" not having three
- characters before it. You could cover that this way:
- /(?:(?!foo)...|^.{0,2})bar/. Sometimes it's still easier just
- to say:
-
- if (/bar/ && $` !~ /foo$/)
-
-
- (?imstx) One or more embedded pattern-match modifiers. This is
- particularly useful for patterns that are specified in a table
- somewhere, some of which want to be case sensitive, and some of
- which don't. The case insensitive ones need to include merely
- (?i) at the front of the pattern. For example:
-
- $pattern = "foobar";
- if ( /$pattern/i )
-
- # more flexible:
-
- $pattern = "(?i)foobar";
- if ( /$pattern/ )
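-
- The three constructs above can be tried out with a sketch like this one.
- The sample strings and the two-entry pattern table are assumptions made
- up for the example.
-
```perl
use strict;
use warnings;

# (?:...) groups without capturing, so split returns no separator fields.
my @clustered = split /\s+(?:a|b)\s+/, "1 a 2 b 3";    # 1, 2, 3
my @captured  = split /\s+(a|b)\s+/,   "1 a 2 b 3";    # 1, a, 2, b, 3

# The lookahead-based pattern from the text: "bar" not preceded by "foo".
my $not_after_foo = '(?:(?!foo)...|^.{0,2})bar';
my %matched;
$matched{$_} = /$not_after_foo/ ? 1 : 0
    for ("foobar", "bazbar", "bar", "xbar");

# (?i) lets one table entry opt in to case-insensitive matching.
my @table = ('(?i)foobar', 'Quux');                    # hypothetical table
my @hits  = grep { "FooBar and quux" =~ /$_/ } @table;

print "@clustered | @captured\n";
printf "%-8s %d\n", $_, $matched{$_} for sort keys %matched;
print "@hits\n";
```
-
- Only "foobar" is rejected by the lookahead pattern, and only the (?i)
- entry of the table matches the mixed-case input.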
-
-
- A question mark was chosen for this and for the new minimal-matching
- construct because 1) question mark is pretty rare in older regular
- expressions, and 2) whenever you see one, you should stop and "question"
- exactly what is going on. That's psychology...
-
- Backtracking
-
- A fundamental feature of regular expression matching involves the notion
- called backtracking, which is currently used (when needed) by all regular
- expression quantifiers, namely *, *?, +, +?, {n,m}, and {n,m}?.
-
- For a regular expression to match, the entire regular expression must
- match, not just part of it. So if the beginning of a pattern containing
- a quantifier succeeds in a way that causes later parts in the pattern to
- fail, the matching engine backs up and recalculates the beginning part--
- that's why it's called backtracking.
-
- Here is an example of backtracking: Let's say you want to find the word
- following "foo" in the string "Food is on the foo table.":
-
- $_ = "Food is on the foo table.";
- if ( /\b(foo)\s+(\w+)/i ) {
-     print "$2 follows $1.\n";
- }
-
- When the match runs, the first part of the regular expression (\b(foo))
- finds a possible match right at the beginning of the string, and loads up
- $1 with "Foo". However, as soon as the matching engine sees that there's
- no whitespace following the "Foo" that it had saved in $1, it realizes
- its mistake and starts over again one character after where it had the
- tentative match. This time it goes all the way until the next occurrence
- of "foo". The complete regular expression matches this time, and you get
- the expected output of "table follows foo."
-
- Sometimes minimal matching can help a lot. Imagine you'd like to match
- everything between "foo" and "bar". Initially, you write something like
- this:
-
- $_ = "The food is under the bar in the barn.";
- if ( /foo(.*)bar/ ) {
-     print "got <$1>\n";
- }
-
- Which perhaps unexpectedly yields:
-
- got <d is under the bar in the >
-
- That's because .* was greedy, so you get everything between the first
- "foo" and the last "bar". In this case, it's more effective to use
- minimal matching to make sure you get the text between a "foo" and the
- first "bar" thereafter.
-
- if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
- got <d is under the >
-
- Here's another example: let's say you'd like to match a number at the end
- of a string, and you also want to keep the preceding part of the match. So
- you write this:
-
- $_ = "I have 2 numbers: 53147";
- if ( /(.*)(\d*)/ ) { # Wrong!
-     print "Beginning is <$1>, number is <$2>.\n";
- }
-
- That won't work at all, because .* was greedy and gobbled up the whole
- string. As \d* can match on an empty string the complete regular
- expression matched successfully.
-
- Beginning is <I have 2 numbers: 53147>, number is <>.
-
- Here are some variants, most of which don't work:
-
- $_ = "I have 2 numbers: 53147";
- @pats = qw{
-     (.*)(\d*)
-     (.*)(\d+)
-     (.*?)(\d*)
-     (.*?)(\d+)
-     (.*)(\d+)$
-     (.*?)(\d+)$
-     (.*)\b(\d+)$
-     (.*\D)(\d+)$
- };
-
- for $pat (@pats) {
-     printf "%-12s ", $pat;
-     if ( /$pat/ ) {
-         print "<$1> <$2>\n";
-     } else {
-         print "FAIL\n";
-     }
- }
-
- That will print out:
-
- (.*)(\d*)    <I have 2 numbers: 53147> <>
- (.*)(\d+)    <I have 2 numbers: 5314> <7>
- (.*?)(\d*)   <> <>
- (.*?)(\d+)   <I have > <2>
- (.*)(\d+)$   <I have 2 numbers: 5314> <7>
- (.*?)(\d+)$  <I have 2 numbers: > <53147>
- (.*)\b(\d+)$ <I have 2 numbers: > <53147>
- (.*\D)(\d+)$ <I have 2 numbers: > <53147>
-
- As you see, this can be a bit tricky. It's important to realize that a
- regular expression is merely a set of assertions that gives a definition
- of success. There may be 0, 1, or several different ways that the
- definition might succeed against a particular string. And if there are
- multiple ways it might succeed, you need to understand backtracking to
- know which variety of success you will achieve.
-
- When using lookahead assertions and negations, this can all get even
- trickier. Imagine you'd like to find a sequence of non-digits not
- followed by "123". You might try to write that as
-
- $_ = "ABC123";
- if ( /^\D*(?!123)/ ) { # Wrong!
-     print "Yup, no 123 in $_\n";
- }
-
- But that isn't going to match; at least, not the way you're hoping. It
- claims that there is no 123 in the string. Here's a clearer picture of
- why that pattern matches, contrary to popular expectations:
-
- $x = 'ABC123' ;
- $y = 'ABC445' ;
-
- print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
- print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;
-
- print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
- print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;
-
- This prints
-
- 2: got ABC
- 3: got AB
- 4: got ABC
-
- You might have expected test 3 to fail because it seems to be a more
- general-purpose version of test 1. The important difference between them
- is that test 3 contains a quantifier (\D*) and so can use backtracking,
- whereas test 1 will not. What's happening is that you've asked "Is it
- true that at the start of $x, following 0 or more non-digits, you have
- something that's not 123?" If the pattern matcher had let \D* expand to
- "ABC", this would have caused the whole pattern to fail. The search
- engine will initially match \D* with "ABC". Then it will try to match
- (?!123) with "123", which of course fails. But because a quantifier (\D*)
- has been used in the regular expression, the search engine can backtrack
- and retry the match differently in the hope of matching the complete
- regular expression.
-
- The pattern really, really wants to succeed, so it uses the standard
- pattern back-off-and-retry and lets \D* expand to just "AB" this time.
- Now there's indeed something following "AB" that is not "123". It's in
- fact "C123", which suffices.
-
- We can deal with this by using both an assertion and a negation. We'll
- say that the first part in $1 must be followed by a digit, and in fact,
- it must also be followed by something that's not "123". Remember that
- the lookaheads are zero-width expressions--they only look, but don't
- consume any of the string in their match. So rewriting this way produces
- what you'd expect; that is, case 5 will fail, but case 6 succeeds:
-
- print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
- print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;
-
- 6: got ABC
-
- In other words, the two zero-width assertions next to each other work as
- though they're ANDed together, just as you'd use any builtin assertions:
- /^$/ matches only if you're at the beginning of the line AND the end of
- the line simultaneously. The deeper underlying truth is that
- juxtaposition in regular expressions always means AND, except when you
- write an explicit OR using the vertical bar. /ab/ means match "a" AND
- (then) match "b", although the attempted matches are made at different
- positions because "a" is not a zero-width assertion, but a one-width
- assertion.
-
- One warning: particularly complicated regular expressions can take
- exponential time to solve due to the immense number of possible ways they
- can use backtracking to try to match. For example, this will take a very
- long time to run:
-
- /((a{0,5}){0,5}){0,5}/
-
- And if you used *'s instead of limiting it to 0 through 5 matches, then
- it would take literally forever--or until you ran out of stack space.
-
- Version 8 Regular Expressions
-
- In case you're not familiar with the "regular" Version 8 regex routines,
- here are the pattern-matching rules not described above.
-
- Any single character matches itself, unless it is a metacharacter with a
- special meaning described here or above. You can cause characters that
- normally function as metacharacters to be interpreted literally by
- prefixing them with a "\" (e.g., "\." matches a ".", not any character;
- "\\" matches a "\"). A series of characters matches that series of
- characters in the target string, so the pattern blurfl would match
- "blurfl" in the target string.
-
- You can specify a character class, by enclosing a list of characters in
- [], which will match any one character from the list. If the first
- character after the "[" is "^", the class matches any character not in
- the list. Within a list, the "-" character is used to specify a range,
- so that a-z represents all characters between "a" and "z", inclusive. If
- you want "-" itself to be a member of a class, put it at the start or end
- of the list, or escape it with a backslash. (The following all specify
- the same class of three characters: [-az], [az-], and [a\-z]. All are
- different from [a-z], which specifies a class containing twenty-six
- characters.)
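-
- To check the claim about the three equivalent classes, one possible
- sketch counts how many characters of an invented sample string each
- class matches; count_matches is a helper defined only for this example.
-
```perl
use strict;
use warnings;

# Count how many characters of $str are matched by the class in $class.
sub count_matches {
    my ($class, $str) = @_;
    my $n = () = $str =~ /$class/g;
    return $n;
}

my $sample = "a-m-z";

# Each of these classes contains exactly "-", "a" and "z".
my @three = map { count_matches($_, $sample) } ('[-az]', '[az-]', '[a\-z]');

# [a-z] is a range: it matches "a", "m" and "z" but not "-".
my $range = count_matches('[a-z]', $sample);

print "@three $range\n";    # prints 4 4 4 3
```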
-
- Characters may be specified using a metacharacter syntax much like that
- used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
- "\f" a form feed, etc. More generally, \nnn, where nnn is a string of
- octal digits, matches the character whose ASCII value is nnn. Similarly,
- \xnn, where nn are hexadecimal digits, matches the character whose ASCII
- value is nn. The expression \cx matches the ASCII character control-x.
- Finally, the "." metacharacter matches any character except "\n" (unless
- you use /s).
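-
- As a quick check of these escapes, the sketch below assumes an ASCII
- platform, where ESC is character 27, and matches it four different ways:
-
```perl
use strict;
use warnings;

# ESC is character 27: octal 033, hex 1B, control-[.
my $esc = chr(27);

# The patterns are kept in single quotes so that the regex engine,
# not the string parser, interprets each escape.
my @patterns = ('\033', '\x1B', '\e', '\c[');
my @matching = grep { my $p = $_; $esc =~ /^$p$/ } @patterns;

# "." does not match a newline unless /s is in effect.
my $dot_plain = "\n" =~ /^.$/  ? 1 : 0;    # 0
my $dot_s     = "\n" =~ /^.$/s ? 1 : 0;    # 1

print scalar(@matching), " of ", scalar(@patterns), " patterns match ESC\n";
print "$dot_plain $dot_s\n";
```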
-
- You can specify a series of alternatives for a pattern using "|" to
- separate them, so that fee|fie|foe will match any of "fee", "fie", or
- "foe" in the target string (as would f(e|i|o)e). The first alternative
- includes everything from the last pattern delimiter ("(", "[", or the
- beginning of the pattern) up to the first "|", and the last alternative
- contains everything from the last "|" to the next pattern delimiter. For
- this reason, it's common practice to include alternatives in parentheses,
- to minimize confusion about where they start and end.
-
- Alternatives are tried from left to right, so the first alternative found
- for which the entire expression matches, is the one that is chosen. This
- means that alternatives are not necessarily greedy. For example: when
- matching foo|foot against "barefoot", only the "foo" part will match, as
- that is the first alternative tried, and it successfully matches the
- target string. (This might not seem important, but it is important when
- you are capturing matched text using parentheses.)
-
- Also remember that "|" is interpreted as a literal within square
- brackets, so if you write [fee|fie|foe] you're really only matching
- [feio|].
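-
- A brief sketch of both points, using the "barefoot" example from the
- text (the variable names are invented):
-
```perl
use strict;
use warnings;

my ($first)  = "barefoot" =~ /(foo|foot)/;     # "foo":  first alternative wins
my ($second) = "barefoot" =~ /(foot|foo)/;     # "foot": order reversed
my ($forced) = "barefoot" =~ /(foo|foot)$/;    # "foot": "foo" fails at "$"

# Inside square brackets "|" is just another member of the class.
my $pipe = "|" =~ /^[fee|fie|foe]$/ ? 1 : 0;   # 1

print "$first $second $forced $pipe\n";        # prints foo foot foot 1
```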
-
- Within a pattern, you may designate subpatterns for later reference by
- enclosing them in parentheses, and you may refer back to the nth
- subpattern later in the pattern using the metacharacter \<n>. Subpatterns
- are numbered based on the left to right order of their opening
- parenthesis. A backreference matches whatever actually matched the
- subpattern in the string being examined, not the rules for that
- subpattern. Therefore, (0|0x)\d*\s\1\d* will match "0x1234 0x4321", but
- not "0x1234 01234", because subpattern 1 actually matched "0x", even
- though the rule 0|0x could potentially match the leading 0 in the second
- number.
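-
- The example above can be checked directly; this sketch simply wraps the
- same pattern and the same two strings in a test:
-
```perl
use strict;
use warnings;

my $re = '(0|0x)\d*\s\1\d*';

my $same  = "0x1234 0x4321" =~ /$re/ ? 1 : 0;   # 1: \1 must repeat "0x"
my $mixed = "0x1234 01234"  =~ /$re/ ? 1 : 0;   # 0: "0" is not what group 1 matched

print "$same $mixed\n";                         # prints 1 0
```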
-
- WARNING on \1 vs $1
-
- Some people get too used to writing things like:
-
- $pattern =~ s/(\W)/\\\1/g;
-
- This is grandfathered for the RHS of a substitute to avoid shocking the
- sed addicts, but it's a dirty habit to get into. That's because in
- PerlThink, the righthand side of a s/// is a double-quoted string. \1 in
- the usual double-quoted string means a control-A. The customary Unix
- meaning of \1 is kludged in for s///. However, if you get into the habit
- of doing that, you get yourself into trouble if you then add an /e
- modifier.
-
- s/(\d+)/ \1 + 1 /eg; # causes warning under -w
-
- Or if you try to do
-
- s/(\d+)/\1000/;
-
- You can't disambiguate that by saying \{1}000, whereas you can fix it
- with ${1}000. Basically, the operation of interpolation should not be
- confused with the operation of matching a backreference. Certainly they
- mean two different things on the left side of the s///.
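-
- A minimal sketch of the two safe spellings, assuming a made-up $price
- string:
-
```perl
use strict;
use warnings;

my $price = "cost: 7";

# ${1}000 is the first capture followed by the literal text "000".
(my $scaled = $price) =~ s/(\d+)/${1}000/;    # "cost: 7000"

# With /e the replacement is Perl code, so the capture is written $1.
(my $bumped = $price) =~ s/(\d+)/$1 + 1/e;    # "cost: 8"

print "$scaled\n$bumped\n";
```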
-
- SEE ALSO
-
- the section on Regexp Quote-Like Operators in the perlop manpage.
-
- the pos entry in the perlfunc manpage.
-
- the perllocale manpage.
-
- Mastering Regular Expressions (see the perlbook manpage) by Jeffrey
- Friedl.